Trainspotting: Extraction and Analysis of Traffic and Topologies of Transportation Networks

Knowledge of real-life traffic patterns is crucial for a good understanding and analysis of
transportation systems. Such data are quite rare. In this paper we propose an algorithm for
extracting both the real physical topology and the network of traffic flows from timetables of
public mass transportation systems. We apply this algorithm to timetables of three large
transportation networks. This enables us to make a systematic comparison between three different
approaches to constructing a graph representation of a transportation network; the resulting
graphs are fundamentally different. We also find that the real-life traffic pattern is very
heterogeneous, both in space and in traffic flow intensities.



I. INTRODUCTION 

In recent years, studies of transportation networks have drawn a substantial amount of
attention in the physics community. The graphs derived from the physical infrastructure of
such networks were analyzed on the examples of a power grid [2], a railway network [3, 13],
road networks [5, 6, 7], a pipeline network, or urban mass transportation systems. These
studies have one important feature in common: they focus exclusively on the topology of the
network and do not take into account the real-life traffic pattern. This makes the view very
incomplete, because carrying traffic is the ultimate goal of every transportation system.
Facing the lack of real-life traffic data, some authors try to estimate the traffic pattern
based exclusively on the topology. Probably the most common load estimator is betweenness
(used, e.g., in [15, 16, 20]), which assumes that each pair of nodes exchanges the same amount
of traffic. But real-life traffic patterns are in fact very heterogeneous, both in space and in
traffic flow intensities. Therefore the most important nodes and edges from a topological
point of view might not necessarily carry the most traffic. In earlier work we showed that in
typical transportation networks the correlation between the real load and the betweenness is
very low. Therefore it is essential for some applications to know the real traffic pattern.

Interestingly, the networks of traffic flows were studied separately; see the examples of
flows of people within a city [22] and commuting traffic flows between different cities [23].
These studies, in turn, neglect the underlying physical topology, making the analysis
incomplete. For instance, it is impossible to detect the most loaded physical edges, which
might be crucial for the resilience of the system. A comprehensive view of the system often
requires analyzing both layers (physical and traffic) together.

Unfortunately, data sets including both the physical topology and the traffic flows are scarce
and difficult to get. In this paper we propose an approach to extract the physical structure
and the network of traffic flows from timetables. Timetables of trains, buses, trams, metros
and other means of mass transportation (henceforth called vehicles) are publicly available.
They provide us with the available connections and their times. Timetables also contain
information about the physical structure of the network and the traffic flows in it but, as we
show later, revealing it often requires nontrivial preprocessing.



II. SPACES AND THE DIFFICULTY OF THE 
PROBLEM 

In order to position our contribution among the works in the field, we begin with a systematic
definition of the topology of transportation systems. The set of nodes is defined by the set of
all stations (train stations, bus stops, etc.). It is not obvious, however, what should be
interpreted as an edge. The choice depends on what we want the topology of the physical graph
to reflect. In the literature there are essentially three approaches that define three
different 'spaces'; here we call them 'space-of-changes', 'space-of-stops' and
'space-of-stations'.

In space-of-changes, two stations are considered to be connected by a link when there is at
least one vehicle that stops at both stations. In other words, all stations used by a single
vehicle are fully interconnected and form a clique. This approach neglects the physical
distance between the stations. Instead, in the resulting topology, the length of a shortest
path between two arbitrary stations A and B is the number of changes of means of
transportation one needs to get from A to B. This approach was used in earlier studies, one of
which uses the term space P.

In space-of-stops, two stations are connected if they are two consecutive stops on a route of
at least one vehicle. Here the length of a shortest path between two stations is the minimal
number of stops one needs to make. Note that the number of stations traversed on the way might
be larger, because the vehicles do not necessarily stop at all of them.

FIG. 1: (Color online) An illustration of the transportation network topology in three spaces.
(a) The routes of three vehicles. The route of Line 2 passes through node C on the way from B
to D, but the vehicle does not stop there. (b) The topology in space-of-changes. Each route
results in a clique. An edge is indicated by two colors when it originates from two routes but
is merged into a single link. (c) The topology in space-of-stops. The "shortcut" B-D is a
legitimate edge in this space. (d) The topology in space-of-stations. This graph reflects the
topology of the real-life infrastructure.



In space-of-stations, two stations are connected only if they are physically directly
connected (with no station in between). This reflects the topology of the real-life
infrastructure. Here, the length of a shortest path between two stations is the minimal number
of stations one has to traverse (stopping or not). This approach was used in earlier studies.

In Fig. 1 we give an illustration of the three spaces. It is easy to see that the graph in
space-of-stations is a subgraph of the graph in space-of-stops, which in turn is a subgraph of
the graph in space-of-changes.

The topologies in space-of-changes and space-of-stops can be obtained directly from
timetables. In space-of-changes, for each vehicle, we fully connect all stations it stops at.
Then we simplify the resulting graph by deleting multi-edges. In space-of-stops, we connect
every two consecutive stops in the routes of vehicles. As shown in Fig. 1, the topology in
space-of-stops can have shortcut links that do not exist in the real-life infrastructure.
These shortcuts should be eliminated in the space-of-stations topology, which makes it more
challenging to obtain. To the best of our knowledge, the only work on extracting the real
physical structure (the topology in space-of-stations) from timetables was done in the context
of railway networks in the PhD dissertation of Annegret Lebers [24]. The proposed solution
first obtains the physical graph in space-of-stops. Next, specific structures in this initial
physical graph, called edge bundles, are detected. The Hamilton paths [40] within these
bundles should indicate the real (non-shortcut) edges. Unfortunately, the bundle recognition
problem turned out to be NP-complete. The heuristics proposed in [24] result in a correct
real/shortcut classification of 80% of edges in the studied graphs. The approach we propose in
this paper is based on simple observations that were omitted in [24]. This results in a much
simpler and more effective algorithm.
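The two direct constructions described above (space-of-changes and space-of-stops) can be
sketched in a few lines of Python. This is only an illustration under stated assumptions: the
function names are hypothetical, edges are stored as unordered pairs, and the three routes are
our reading of the example of Fig. 1 (the label "Line 3" for the third vehicle is assumed).

```python
from itertools import combinations


def space_of_changes(routes):
    """All stations served by one vehicle are fully interconnected (a clique)."""
    edges = set()
    for route in routes:
        for u, v in combinations(set(route), 2):
            edges.add(frozenset((u, v)))  # a set of edges removes multi-edges
    return edges


def space_of_stops(routes):
    """Every two consecutive stops of a vehicle are connected."""
    edges = set()
    for route in routes:
        for u, v in zip(route, route[1:]):
            if u != v:
                edges.add(frozenset((u, v)))
    return edges


# Routes as listed in a timetable: only the stations where the vehicle stops.
routes = [
    ["A", "B", "C", "D", "E"],  # Line 1
    ["A", "B", "D", "E"],       # Line 2 passes C without stopping
    ["F", "B", "G", "H"],       # third vehicle ("Line 3", assumed label)
]

changes = space_of_changes(routes)
stops = space_of_stops(routes)

# The graph in space-of-stops is a subgraph of the graph in space-of-changes.
print(len(changes), len(stops), stops.issubset(changes))
```

Note that the shortcut B-D is present in `stops`; removing such links is exactly what the
space-of-stations construction has to achieve.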



III. RELATED WORK 

Timetables have been used as a data source for network construction in earlier works. However,
the topologies obtained in these works were either in space-of-changes or in space-of-stops;
neither of them reflected the real-life infrastructure. Moreover, the real traffic patterns
were not considered in these studies. This is understandable, because it is difficult to
interpret a traffic flow in space-of-changes or in space-of-stops. Does the "traffic" on a
shortcut link have any physical meaning? We know that this traffic actually traverses other,
non-shortcut links that exist in reality. In contrast, in space-of-stations the traffic flows
have a clear, unambiguous and natural interpretation.

Another class of networks that can be constructed with the help of timetables are airport
networks [6, 25, 26, 27]. There, the nodes are the airports, and the edges are the flight
connections. The weight of an edge reflects the traffic on this connection, which can be
approximated by the number of flights that use it during one week. In this case, both the
topology and the traffic information are explicitly given by timetables. This is because the
routes of planes are not constrained to any physical infrastructure, as opposed to roads for
cars or rail tracks for trains. So there are no "real" links and "shortcut" links. In a sense
all links are real, and the topologies in space-of-stops and in space-of-stations actually
coincide.

Inferring the space-of-stations topology from timetables also becomes simple in another
special case, where the vehicles stop at each station they traverse (e.g., in many subway
networks). This naturally eliminates the shortcuts, making the topologies in space-of-stops
and space-of-stations identical. This is not true in the general case, with both local and
express vehicles.

In the remainder of this paper, we introduce the necessary notation in Section IV. Next, in
Section V we give an algorithm that extracts the real physical structure (a topology in
space-of-stations) and the network of traffic flows from timetables. In Section VI we test our
algorithm on timetables of three large transportation networks at three different scales:
city, country and continent. We also analyze the resulting physical topologies and compare
them with those obtained by alternative approaches. Finally, in Section VII we conclude the
paper.



IV. NOTATION 
A. Two layers 

We follow the two-layer framework introduced in [21]. The lower-layer topology is called a
physical graph $G^\phi = (V^\phi, E^\phi)$, and the upper-layer topology is called a logical
graph $G^\lambda = (V^\lambda, E^\lambda)$. We assume that the sets of nodes at both layers are
identical, i.e., $V^\phi = V^\lambda$, but as a general rule we keep the indices $\phi$ and
$\lambda$ to make the description unambiguous. Let $N = |V^\phi| = |V^\lambda|$ be the number
of nodes. Every logical edge $e^\lambda = \{u^\lambda, v^\lambda\}$ is mapped on the physical
graph as a path $M(e^\lambda) \subset G^\phi$ connecting the nodes $u^\phi$ and $v^\phi$,
corresponding to $u^\lambda$ and $v^\lambda$. (A path is defined by the sequence of nodes it
traverses.) The set of paths corresponding to all logical edges is called a mapping
$M(E^\lambda)$ of the logical topology on the physical topology.

In the field of transportation networks, the undirected, unweighted physical graph $G^\phi$
captures the topology of the physical infrastructure (i.e., in space-of-stations), and the
weighted logical graph $G^\lambda$ reflects the undirected traffic flows. Every logical edge
$e^\lambda$ is created by connecting the first and the last node of the corresponding traffic
flow, and by assigning a weight $w(e^\lambda)$ that represents the intensity of this flow. The
mapping $M(e^\lambda)$ of the edge $e^\lambda$ is the path taken by this flow.
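As a concrete reading of this notation, the sketch below stores a logical edge together with
its weight and its mapping, and then accumulates the weights of all flows on the physical
edges they traverse. The class `LogicalEdge` and the helper `edge_load` are hypothetical names
introduced only for illustration; the two flows are those of the Fig. 1 example discussed in
Section V.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class LogicalEdge:
    endpoints: frozenset  # {u, v}: first and last node of the traffic flow
    weight: int           # w(e): intensity of the flow
    mapping: tuple        # M(e): sequence of nodes of the path in the physical graph

    def physical_edges(self):
        """E(M(e)): the physical edges traversed by this flow."""
        return {frozenset(pair) for pair in zip(self.mapping, self.mapping[1:])}


def edge_load(flows):
    """Sum of the weights of all flows using each physical edge (assumed load measure)."""
    load = Counter()
    for flow in flows:
        for edge in flow.physical_edges():
            load[edge] += flow.weight
    return load


flows = [
    LogicalEdge(frozenset(("A", "E")), 2, ("A", "B", "C", "D", "E")),
    LogicalEdge(frozenset(("F", "H")), 1, ("F", "B", "G", "H")),
]
load = edge_load(flows)
print(load[frozenset(("A", "B"))], load[frozenset(("B", "G"))])
```

Keeping the mapping next to each logical edge is what allows the two layers to be analyzed
together, for instance to locate the most loaded physical edges.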



B. Timetable data 

We take a list of all vehicles departing in the system within some period (e.g., one weekday).
Denote by $R = \{r_i\}_{i=1..|R|}$ the list of routes followed by these vehicles, where $|R|$
is the total number of vehicles. A route $r_i$ of the $i$th vehicle is defined by the list of
nodes it traverses. Note that since there is usually more than one vehicle following the same
path on one day, some of the routes may be identical.



V. ALGORITHM 

The algorithm has three phases. In the first one, initialization, based on the set of routes
$R$, we create the set of nodes $V^\phi = V^\lambda$ and the physical topology
$G^\phi_{stop} = (V^\phi, E^\phi_{stop})$ in space-of-stops. In the second, main phase, the
sets $R$ and $E^\phi_{stop}$ are iteratively refined by detecting and erasing the shortcut
links in the physical graph $G^\phi_{stop}$, resulting in the physical topology
$G^\phi_{stat} = (V^\phi, E^\phi_{stat})$ in space-of-stations. Finally, in the third phase,
we group the vehicles with identical routes, and obtain the logical graph $G^\lambda$ and the
mapping $M(E^\lambda)$ of the logical edges on the physical graph $G^\phi_{stat}$. We describe
each phase separately below.

A. Phase 1 - initialization 

In this phase we interpret every two consecutive nodes in any route $r_i \in R$ as directly
connected. Consequently, we connect these nodes with a link, which can be written as

$$E^\phi_{stop} = \bigcup_{i=1..|R|} E(r_i),$$

where $E(r_i)$ is the set of all pairs of adjacent nodes in $r_i$ (i.e., all edges in $r_i$).
This results in the physical topology $G^\phi_{stop} = (V^\phi, E^\phi_{stop})$ in
space-of-stops.

B. Phase 2 - deleting shortcuts 

In this phase, at each iteration, we detect a shortcut in the set of physical edges, delete
it, and update all routes $r_i$ that use this shortcut. Denote by $e^\phi_{(1)}$,
$e^\phi_{(2)}$ the two end-nodes of $e^\phi$, and by Rev($P_{e^\phi}$) the reversed version of
$P_{e^\phi}$ (the sequence from the last node to the first one). The algorithm is as follows:

1. $E^\phi_{stat} := E^\phi_{stop}$

2. Find a tuple $(e^\phi, r_i)$ such that $e^\phi$ is a shortcut for $r_i$, i.e.,
   $e^\phi_{(1)} \in r_i$ and $e^\phi_{(2)} \in r_i$ and $e^\phi \notin E(r_i)$.

3. IF no $(e^\phi, r_i)$ found THEN RETURN $E^\phi_{stat}$ and $R$.

4. $P_{e^\phi}$ := subpath of $r_i$ from $e^\phi_{(1)}$ to $e^\phi_{(2)}$

5. FOR all $r_j \in R$ DO:

   • IF $(e^\phi_{(1)}, e^\phi_{(2)}) \subset r_j$ THEN replace it with $P_{e^\phi}$

   • IF $(e^\phi_{(2)}, e^\phi_{(1)}) \subset r_j$ THEN replace it with Rev($P_{e^\phi}$)

6. $E^\phi_{stat} := E^\phi_{stat} \setminus \{e^\phi\}$

7. GOTO 2






In Step 2, we look for a physical link that is a shortcut. We declare a physical link $e^\phi$
to be a shortcut if there exists a route $r_i \in R$ such that $e^\phi$ connects two
nonconsecutive nodes in $r_i$. For example, in Fig. 1, $e^\phi = \{B, D\}$ is a shortcut
because it connects two non-neighboring nodes in the route $r_1$ of Line 1. If no physical
edge can be declared a shortcut, the algorithm quits in Step 3, returning $E^\phi_{stat}$ and
$R$. Otherwise, in Step 4, we find the path $P_{e^\phi}$ that this shortcut should take. In
Fig. 1 this path is $P_{e^\phi} = (B, C, D)$. In Step 5, we update the set of routes $R$ by
replacing the shortcut link $e^\phi$ in every route using it with the corresponding path
$P_{e^\phi}$. In our example, the updated route of Line 2 becomes $r_2 = (A, B, C, D, E)$. It
is thus identical to the route of Line 1. Finally, in Step 6 we delete the shortcut $e^\phi$
from the physical graph. We iterate these steps until no shortcut is found (Step 2). The
resulting physical graph $G^\phi_{stat} = (V^\phi, E^\phi_{stat}) \subset G^\phi_{stop}$ is a
graph in space-of-stations.
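The steps of phases 1 and 2 can be sketched as follows. This is a minimal illustration rather
than a reference implementation: all function names are hypothetical, each route is assumed to
be non-empty and to visit every station at most once, and no attempt is made to search for
shortcuts efficiently. The routes are our reading of the example of Fig. 1.

```python
def stop_edges(routes):
    """Phase 1: connect every two consecutive nodes of every route."""
    return {frozenset((u, v)) for r in routes for u, v in zip(r, r[1:]) if u != v}


def find_shortcut(edges, routes):
    """Step 2: return (edge, path) where path is the subpath of some route
    between the end-nodes of a shortcut edge, or None if there is no shortcut."""
    for edge in edges:
        u, v = tuple(edge)
        for route in routes:
            if u in route and v in route:
                i, j = route.index(u), route.index(v)
                if abs(i - j) > 1:  # end-nodes are not consecutive in this route
                    path = route[min(i, j):max(i, j) + 1]
                    return edge, (path if path[0] == u else path[::-1])
    return None


def replace_hop(route, u, v, path):
    """Step 5: replace the hop u-v (in either direction) with path = [u, ..., v]."""
    out = [route[0]]
    for a, b in zip(route, route[1:]):
        if (a, b) == (u, v):
            out.extend(path[1:])        # forward: nodes after u, ending in v
        elif (a, b) == (v, u):
            out.extend(path[-2::-1])    # reversed path without its first node v
        else:
            out.append(b)
    return out


def delete_shortcuts(routes):
    """Phases 1 and 2: returns the space-of-stations edges and the updated routes."""
    routes = [list(r) for r in routes]
    edges = stop_edges(routes)           # Step 1
    while True:
        found = find_shortcut(edges, routes)
        if found is None:                # Step 3
            return edges, routes
        edge, path = found               # Step 4
        routes = [replace_hop(r, path[0], path[-1], path) for r in routes]
        edges.discard(edge)              # Step 6, then back to Step 2


routes = [["A", "B", "C", "D", "E"], ["A", "B", "D", "E"], ["F", "B", "G", "H"]]
stat_edges, new_routes = delete_shortcuts(routes)
print(sorted(sorted(e) for e in stat_edges))
print(new_routes[1])
```

Each iteration removes one edge from a finite set, so the loop terminates after at most as
many iterations as there are edges in space-of-stops.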

C. Phase 3 - grouping the same routes together 

Finally, based on the list $R$ of routes updated in phase 2, we find groups of vehicles that
follow the same path (in any direction). Each such group defines one edge $e^\lambda$ in the
logical graph; $e^\lambda$ connects the first and the last node of the route. The number of
vehicles that follow this route becomes the weight $w(e^\lambda)$ of the logical edge
$e^\lambda$; the route itself becomes the mapping $M(e^\lambda)$ of $e^\lambda$ on the
physical graph.

Denote by $r_{i(first)}$, $r_{i(last)}$ the first and the last nodes in $r_i$, and by
$E(M(e^\lambda))$ the set of all physical edges in the mapping of $e^\lambda$. Now, Phase 3
can be stated as follows:

1. $E^\lambda = \emptyset$, $M = \emptyset$

2. FOR $i = 1$ TO $|R|$ DO:

   • $e^\lambda_i = \{r_{i(first)}, r_{i(last)}\}$

   • IF $e^\lambda_i \in E^\lambda$ THEN $w(e^\lambda_i) := w(e^\lambda_i) + 1$
     ELSE $E^\lambda = E^\lambda \cup \{e^\lambda_i\}$, $M(e^\lambda_i) = r_i$,
     $w(e^\lambda_i) = 1$

3. $E^\phi_{stat} = \bigcup_{e^\lambda \in E^\lambda} E(M(e^\lambda))$

In the example in Fig. 1, after phase 2 the routes of Line 1 and Line 2 become identical;
therefore in phase 3 they are grouped together, defining a logical edge
$e^\lambda_1 = \{A, E\}$ with the weight $w(e^\lambda_1) = 2$ and the mapping
$M(e^\lambda_1) = (A, B, C, D, E)$. A second logical edge is $e^\lambda_2 = \{F, H\}$ with
$w(e^\lambda_2) = 1$ and $M(e^\lambda_2) = (F, B, G, H)$.
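A possible sketch of the grouping step is given below. It is an illustration under one stated
assumption: vehicles are grouped by their full path taken in either direction, as described in
the text, rather than by the pair of end-nodes alone, so two different paths between the same
end-nodes would stay separate. The function name `group_routes` is hypothetical.

```python
def group_routes(routes):
    """Phase 3: group identical routes (in any direction) into weighted flows.

    Returns a dict that maps the mapping M(e) of each logical edge, stored as a
    tuple of nodes in a canonical direction, to its weight w(e)."""
    flows = {}
    for route in routes:
        forward, backward = tuple(route), tuple(reversed(route))
        key = min(forward, backward)  # same key for both travel directions
        flows[key] = flows.get(key, 0) + 1
    return flows


# Routes of the Fig. 1 example after phase 2.
routes = [
    ["A", "B", "C", "D", "E"],
    ["A", "B", "C", "D", "E"],
    ["F", "B", "G", "H"],
]
flows = group_routes(routes)

for mapping, weight in flows.items():
    logical_edge = (mapping[0], mapping[-1])  # first and last node of the route
    print(logical_edge, weight, mapping)

# Step 3: the physical edges are those used by at least one mapping.
stat_edges = {frozenset(p) for m in flows for p in zip(m, m[1:])}
```

For the example this yields one flow between A and E with weight 2 and one between F and H
with weight 1, in agreement with the text above.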

D. Accuracy of the algorithm 

There are potential sources of mistakes and inaccuracies in our approach. First, the links
that we delete as shortcuts might actually exist in reality. However, a comparison of the
results of our algorithm with the real maps (see Section VI) reveals very few differences,
which means that this kind of failure occurs very rarely in real data sets.

A second problem lies in the estimation of the traffic pattern. Interpreting the routes of
trains, buses, trams, metros, etc., as traffic flows gives us a picture at a coarse level of
granularity. We view every vehicle as a traffic unit, regardless of its size or the number of
people it carries. Moreover, people usually use these vehicles only on a portion of their
total journey, not from the first to the last station. Clearly, the vehicle routes are the
result of an optimization process that takes into account many factors, such as people's
demand, continuity of the path, traveling times and availability of stock. However, we believe
that they reflect well the general direction and intensity of travel, and we take a vehicle as
the basic traffic unit. After all, it is the vehicles that appear on the roads and cause
traffic, not the people they transport.



VI. A STUDY OF THREE REAL-LIFE 
NETWORKS 

In this section we apply our algorithm to extract the data from the timetables of three
examples of transportation networks, with sizes ranging from city to continent. As an example
of a city, we take the mass transportation system (buses, trams and metros) of Warsaw (WA),
Poland. At a country level, we study the railway network of Switzerland (CH). Finally, we
investigate the railway network formed by major trains and stations in most countries of
central Europe (EU) [43]. The timetables of all three networks are publicly available. The
basic parameters of the data sets and of the resulting graphs can be found in Table I.

This section is organized as follows. First, we focus on 
a particular data set in order to study the performance of 
our algorithm. Next, we analyze and compare the phys- 
ical graphs originating from all three data sets in each 
of the considered spaces. Finally, we focus our attention 
on the logical graphs and traffic flows extracted by our 
algorithm. 



A. An example: The railway network of 
Switzerland (CH) 

As an illustration, let us consider more closely the railway network of Switzerland (CH).
According to our timetable, on a typical weekday there are $|R| = 6957$ different trains that
follow $|E^\lambda| = 505$ different routes (usually more than one train follows the same
route during one day). Our data contain $N = 1613$ stations in Switzerland, together with
their physical coordinates. In Fig. 2 we present the graphs obtained from this data set. The
physical graphs in the three spaces are shown in Figs. 2(a)-(c). The graph in
space-of-stations was obtained with the help of the algorithm introduced in the previous
section. The number of vertices is the same in all three spaces. The number of edges in
space-of-changes, $|E^\phi_{change}| = 19827$, is much larger than in the other two spaces.
Although at first sight the physical graphs in space-of-stations and in space-of-stops look
comparable, the latter has a number of shortcut links that do not exist in reality. For a
visual verification of the correctness of our algorithm, we show in Fig. 2(d) the real map of
the Swiss railway system; we observe only minor differences between (c) and (d). Finally, in
Fig. 2(e) we present the logical graph that reflects the traffic flows in the network. This
graph is very heterogeneous both in the weights of edges and in the layout of traffic.

FIG. 2: The railway network in Switzerland (CH). (a,b,c) Physical graphs in space-of-changes,
space-of-stops and space-of-stations, respectively. (d) The real map of the rail tracks in
Switzerland. (e) The logical graph. Every edge connects the first and the last station of a
particular train route; its weight reflects the number of trains following this route in any
direction.

Data set      Area        $N$    $|R|$   $|E^\lambda|$   space      $|E^\phi|$   $\langle k^\phi\rangle$   $d^\phi$   $\langle l^\phi\rangle$   $c^\phi$
EU (Europe)   2'081'000   4853   60775   6703            changes    88329        36.4                                 3.7                       0.7347
                                                         stops      8600         3.5                       48         12.6                      0.3401
                                                         stations   5765         2.4                       184        50.9                      0.0129

TABLE I: The studied data sets. "Area" is the surface occupied by the region covered by the
network. $N$ is the number of nodes (stations/stops). $|R|$ is the total number of vehicles
departing in the network during one weekday. $|E^\lambda|$ is the number of edges in the
logical graph (the number of traffic flows); it is much smaller than $|R|$, because the
vehicles following the same route are grouped together in phase 3 of our algorithm. All the
remaining parameters are computed for the physical graphs $G^\phi$: $|E^\phi|$ is the number
of edges, $\langle k^\phi\rangle$ is the average node degree, $d^\phi$ stands for the
diameter, $\langle l^\phi\rangle$ is the average shortest path length, and $c^\phi$ is the
clustering coefficient.

B. The physical graph in three spaces

How does the choice of space affect the topology? In this section we study the physical graphs
in the three spaces with respect to the basic metrics often used in the analysis of complex
networks.

1. Diameter $d^\phi$ and average shortest path length $\langle l^\phi\rangle$

The average shortest path length $\langle l\rangle$ is computed over the lengths of shortest
paths between all pairs of vertices. The diameter $d$ is the longest of all shortest path
lengths. These parameters are usually closely related.

The diameters and average shortest path lengths of the graphs in space-of-stations are large,
and scale roughly as $\sqrt{N}$ with the number of nodes $N$. This is typical of many planar,
lattice-like infrastructure networks embedded in a two-dimensional space.

The graphs in space-of-stops have about 10-15% more edges than their counterparts in
space-of-stations. The difference is not large, and one could expect similar values of the
diameter and the average shortest path length. However, these 10-15% of edges are
fundamentally different from typical edges in space-of-stations; they are shortcut links. It
was shown in [30] that the diameter of a graph is very sensitive to the existence of
shortcuts. Even a relatively small number of shortcuts can dramatically bring down the
diameter and the average shortest path length. We observe this phenomenon in our graphs. For
instance, in the EU data set, the diameter drops about four times, from $d^\phi = 184$ in
space-of-stations to 48 in space-of-stops. Similarly, the average shortest path length drops
by roughly the same factor. Therefore, the shortcut edges, although not very numerous, play a
very important role and make the graphs in space-of-stops very different from those in
space-of-stations.

This effect is less pronounced in the WA data set. The underlying reason is the relatively
short length of shortcuts (usually 2 hops), which was shown to affect the diameter only to a
small extent [31].

Finally, the graphs in space-of-changes have very small diameters and average shortest path
lengths. This is mainly because of their high density (number of edges).

2. Node degree k

The node degree distributions in all three spaces are plotted in a semi-logarithmic scale in
Figs. 3(a)-(c). Additionally, for space-of-stops, we plot the degree distribution in a log-log
scale (Fig. 3(d)), because it is not obvious which fit is better, exponential or power law
(this was also pointed out in [13]). For the other two spaces we observe a clear linear trend
indicating exponential behavior. This was expected in space-of-stations, because the degree
distribution of many infrastructure networks was shown to be narrow (here one decade) and to
decay exponentially (see, e.g., power lines). In space-of-stations the vast majority of nodes
have degree equal to two, indicating long segments of stations without junctions.

FIG. 3: Node degree distributions in physical graphs in the three spaces, for the data sets
WA, CH and EU. Plots (a-c) use a semi-logarithmic scale; plot (d), for space-of-stops, uses a
log-log scale. If necessary, the data is lin-binned or log-binned, accordingly.

FIG. 4: The lengths of original timetable routes (x axis) versus these lengths after the
application of our algorithm (y axis). All three data sets are drawn in the same scale.



3. Clustering coefficient c

We have studied the clustering coefficient $c$, defined as the probability that two randomly
chosen neighbors of a node are also direct neighbors of each other [30].

The clustering coefficients of topologies in space-of-changes are very high, which is a direct
consequence of a very high density and the existence of many cliques. What is more interesting
is that in all three data sets, the clustering coefficient in space-of-stops is 1-2 orders of
magnitude larger than in space-of-stations. As in the case of the graph diameter, here again
the shortcut links turn out to play a very important role in the topology.
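The effect of a single shortcut on these metrics can be reproduced on the small example of
Fig. 1. The sketch below is an illustration only: the helper names are hypothetical, the graph
is assumed to be connected, and the clustering coefficient is taken as the average of the
local coefficients over all nodes, with nodes of degree below two counted as zero (one common
convention, assumed here).

```python
from collections import deque
from itertools import combinations


def adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def bfs_distances(adj, source):
    """Hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist


def path_metrics(adj):
    """Diameter and average shortest path length over all pairs of nodes."""
    lengths = []
    for s in adj:
        lengths.extend(d for t, d in bfs_distances(adj, s).items() if t != s)
    return max(lengths), sum(lengths) / len(lengths)


def clustering(adj):
    """Average local clustering coefficient (assumed convention, see lead-in)."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k >= 2:
            links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
            total += 2 * links / (k * (k - 1))
    return total / len(adj)


stations = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"),
            ("F", "B"), ("B", "G"), ("G", "H")]
stops = stations + [("B", "D")]  # the same graph plus the shortcut B-D

d_stat, l_stat = path_metrics(adjacency(stations))
d_stop, l_stop = path_metrics(adjacency(stops))
c_stat, c_stop = clustering(adjacency(stations)), clustering(adjacency(stops))
print(d_stat, d_stop, round(l_stat, 2), round(l_stop, 2), c_stat, round(c_stop, 3))
```

Even in this toy graph the single shortcut lowers the diameter and the average shortest path
length and makes the clustering coefficient nonzero, which mirrors the trends reported in
Table I.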



C. Traffic flows and the logical graph

Now we turn our attention to the traffic that flows in our networks. We extracted this scarce
data with the help of the algorithm introduced in this paper. As we argued before, the
interpretation of traffic flowing through networks in space-of-changes and space-of-stops is
rather cumbersome. Therefore we restrict our analysis to the traffic flows traversing the
physical graph in space-of-stations.

In Fig. 4 we compare the lengths of traffic flows before and after the application of our
algorithm. A new traffic flow can be either equal in length to the original one (if no
shortcut was detected on its path), or longer. We observe that for all three data sets, there
is a significant number of flows that become longer. In some cases the length increases by as
much as 10 times. Generally, the longer the original flow is, the less extended it gets during
a run of our algorithm. This is expected, because a long flow in a timetable usually
corresponds to a local train that stops at all stations (i.e., uses no shortcuts).

In Fig. 5 we present basic distributions measured for logical graphs in the three data sets.
Recall that the edges in a logical graph reflect the traffic flows. Therefore, the node degree
is the number of different connections starting/ending at the corresponding station
(Fig. 5(a)). The strength of a node is the sum of the weights of neighboring edges [25]; here
it is the number of all connections starting/ending at this station (Fig. 5(b)). Finally, the
weight $w(e^\lambda)$ of a logical edge is the traffic flow intensity (Fig. 5(c)).

All three distributions are heavily right-skewed, meaning that there is a small number of
nodes/edges with very high values of the observed parameter. We conclude that the real-life
traffic patterns are very heterogeneous, both in space (node degree and strength) and in
traffic flow intensities. This was shown in [21] to be the reason for the high
unpredictability of load distribution in transportation networks.

FIG. 5: Properties of logical graphs. (a) Node degree distribution. Many nodes are isolated:
they represent intermediate stations at which no train starts or terminates its journey. We
represent the isolated nodes here as having "degree" equal to 0.1. (b) Node strength
distribution. (c) Edge weight (traffic flow intensities) distribution. All data are log-binned
and plotted in a log-log scale.

VII. CONCLUSIONS

Knowledge of the real-life traffic pattern is crucial in the analysis of transportation
systems. This data is usually much more difficult to get than the pure topology of a network.
In this paper we have proposed an algorithm for extracting both the physical topology and the
network of traffic flows from timetables of public mass transportation systems. We have
applied our algorithm to three large transportation networks. This enabled us to make a
systematic comparison between three different approaches (or "spaces") to constructing a graph
representation of a transportation network. The resulting physical topologies are very
different. In particular, the seemingly similar graphs in space-of-stops and in
space-of-stations turn out to be very different in terms of basic graph-theory metrics such as
diameter, average shortest path length, clustering coefficient and node degree distribution.
This is due to the existence of shortcut links in space-of-stops. Our algorithm detects and
eliminates these shortcuts, and extracts the topology in space-of-stations. Only this graph
reflects the real-life physical infrastructure that is used by the traffic flows, gets
congested, or can be prone to failures or susceptible to attacks. In contrast, the edges in
space-of-changes and in space-of-stops are somewhat "virtual," and the notion of traffic in
these graphs is unclear, if it makes sense at all. Importantly, the results are consistent
across three different scales of the studied networks (city, country, continent).

This work has several possible directions for the future. For instance, the knowledge of the
real traffic pattern allows us to revisit the error and attack tolerance of transportation
systems, which might look completely different when focussing on traffic instead of on
topology. Another direction would be to exploit additional information available in some
timetables. For instance, in our data sets CH and EU, we also know the geographical
coordinates of the nodes. They therefore fall in the category of spatial networks that have
recently been intensively studied [6, 9, 34, 35, 36]. In particular, we think that
incorporating the real traffic pattern in the models can help in understanding the processes
that govern the evolution of spatial networks.

Finally, we note that the data will soon be made available.

The work presented in this paper was financially supported by grant DIGS 1830 of the Hasler
Foundation, Bern, Switzerland.

In this sense, a graph in space-of-changes is closely related to the dual interpretation of
urban road networks, where streets (of a given name) map to nodes, and intersections between
streets map to links between the nodes. In a transportation network in space-of-changes, the
length of a shortest path is the number of changes of means of transportation, whereas the
length of a shortest path in a dual graph of a city is the number of changes of streets on the
way from the starting point to the destination.

[40] A Hamilton path is a path that passes through every vertex of a graph exactly once.

[43] In the EU data set, Paris originally has several stations that are not directly connected
to each other. Following an approach used in earlier work, we merged them into one common
node.



